Outline

  • Introduction

  • Materials and Methods

  • Results

  • Discussion

  • Conclusion

Introduction

  • Project aim

Materials and Methods

  • Obtain data set

  • Data Wrangling

  • Exploratory Data Analysis

  • Analysis and PCA Modeling

  • Logistic Regression model and Machine Learning

  • Shiny App

  • Working collaboratively using RStudio Cloud and Github

Materials and Methods

  • Include flow chart diagram

  • Talk about different methods

Results: Exploratory Data Analysis

Results: EDA (contd.)

Results: Analysis and Modeling

The data are well separated, so classification appears feasible.

Results: Analysis and Modeling

  • Logistic regression fitted with tidymodels

  • Binary classification

  • Predictors: other_diseases, height, weight, famHist2DBin, famHist1DBin

  • Assessing the predictive ability of the model
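
One common way to assess the predictive ability of a binary classifier is to compare its predicted labels against the true labels and report accuracy, sensitivity, and specificity. The sketch below is in plain Python and is not the tidymodels code used in the project; the function names and the toy labels are made up for illustration only and are not results from our data set.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary outcome."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive and a != positive:
            fp += 1
        elif p != positive and a != positive:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(actual, predicted, positive=1):
    """Return accuracy, sensitivity, and specificity as a dict."""
    tp, fp, tn, fn = confusion_counts(actual, predicted, positive)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else float("nan"),
        # Sensitivity: share of true positives that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Specificity: share of true negatives that were detected.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Hypothetical labels (1 = diabetic, 0 = non-diabetic), illustration only.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 0, 1, 0, 1]

metrics = classification_metrics(actual, predicted)
print(metrics)
```

Reporting sensitivity and specificity alongside accuracy helps when the two classes are unbalanced, since accuracy alone can look good even if one class is rarely predicted correctly.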

Discussion

  • Limited by the data set: the location, race, and habitat of the source population limit how well the model generalizes globally

  • Notable observation: height appears to be associated with the likelihood of diabetes

  • The accuracy of our model could likely be improved with additional predictors and data points

  • Scope for cross-platform and integrated studies

Conclusion

  • It was feasible to analyze our data set and obtain biological insights from it

  • We conclude that height and weight are important indicators of T1 diabetes

  • We expected family history to be more important

  • More descriptive data would have made it easier to test hypotheses and draw conclusions